Smart Paper Notes

CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods

Giovanni Menegozzo , Diego Dall'Alba , Paolo Fiorini

Categories: Machine Learning | Artificial Intelligence

2022-08-02

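Causal relationships are commonly examined in manufacturing processes to support fault investigations, perform interventions, and make strategic decisions. Industry 4.0 has made available an increasing amount of data that enables data-driven Causal Discovery (CD). Given the growing number of recently proposed CD methods, it is necessary to introduce rigorous benchmarking procedures on publicly available datasets, since they are the foundation for a fair comparison and validation of different methods. This work introduces two novel public datasets for CD in continuous manufacturing processes. The first dataset uses the well-known Tennessee Eastman simulator for fault detection and process control. The second dataset is extracted from an ultra-processed food manufacturing plant and includes a description of the plant as well as multiple ground truths. These datasets are used to propose a benchmarking procedure based on different metrics, which is evaluated on a wide selection of CD algorithms. This work allows CD methods to be tested under realistic conditions, enabling the selection of the most suitable method for a specific target application. The datasets are available at the following link: https://github.com/giovannimen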
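The abstract does not say which metrics the benchmark uses. As an illustration only, the sketch below compares a predicted causal graph against a ground-truth graph using edge precision, edge recall, and a simple Hamming distance over directed edges. The function names and the toy graphs are hypothetical and are not taken from the paper or its repository.

```python
def edge_set(adj):
    """Return the set of directed edges (i, j) in a 0/1 adjacency matrix."""
    return {(i, j) for i, row in enumerate(adj) for j, v in enumerate(row) if v}


def compare_graphs(pred, truth):
    """Compare two directed graphs given as adjacency matrices (lists of lists)."""
    p, t = edge_set(pred), edge_set(truth)
    tp = len(p & t)  # edges present in both graphs with the same direction
    return {
        "precision": tp / len(p) if p else 0.0,
        "recall": tp / len(t) if t else 0.0,
        # Number of directed edges present in exactly one graph.
        # A reversed edge therefore counts twice in this simple variant.
        "hamming": len(p ^ t),
    }


# Toy example: truth is 0 -> 1 -> 2, prediction reverses the second edge.
truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
pred = [[0, 1, 0],
        [0, 0, 0],
        [0, 1, 0]]
result = compare_graphs(pred, truth)
```

Some definitions of structural Hamming distance count a reversed edge once rather than twice, so the exact variant should be fixed before comparing algorithms.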

Colonoscopy Navigation using End-to-End Deep Visuomotor Control: A User Study

Ameya Pore , Martina Finocchiaro , Diego Dall'Alba , Albert Hernansanz , Gastone Ciuti , Alberto Arezzo , Arianna Menciassi , Alicia Casals , Paolo Fiorini

Categories: Robotics | Artificial Intelligence

2022-06-30

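Flexible endoscopes for colonoscopy have several limitations due to their inherent complexity, resulting in patient discomfort and a lack of intuitiveness for clinicians. Robotic devices combined with autonomous control are a viable way to reduce the workload and training time of endoscopists while improving the overall outcome of the procedure. Prior work on autonomous endoscope control uses heuristic policies, which limits generalization to the unstructured and highly deformable colon environment and requires frequent human intervention. This work proposes an image-based endoscope control using Deep Reinforcement Learning, called Deep Visuomotor Control (DVC), to exhibit adaptive behavior in convoluted sections of the colon tract. DVC learns a mapping between endoscopic images and the control signals of the endoscope. A first user study with 20 expert gastrointestinal endoscopists was carried out to compare their navigation performance with that of DVC policies in a realistic virtual simulator. The results indicate that DVC shows equivalent performance on several assessment parameters while being safer. In addition, a second user study with 20 novice participants was performed to demonstrate that human supervision is easier than with a state-of-the-art heuristic control policy. Seamless supervision of colonoscopy procedures would enable interventionists to focus on medical decisions rather than on the control problem of the endoscope.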

The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings

Francisco Valentini , Germán Rosati , Diego Fernandez Slezak , Edgar Altszyler

Categories: Natural Language Processing | Artificial Intelligence

2023-01-02

Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
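The abstract does not spell out the PMI-based metric. A minimal sketch, assuming the bias of a word is the difference between its PMI with a set of female context words and its PMI with a set of male context words, both estimated from raw co-occurrence counts. The function names and the toy counts are hypothetical.

```python
import math


def pmi(joint, count_w, count_c, total):
    """Pointwise mutual information log2(p(w, c) / (p(w) * p(c))) from raw counts."""
    return math.log2((joint / total) / ((count_w / total) * (count_c / total)))


def pmi_bias(cooc_female, cooc_male, count_w, count_female, count_male, total):
    """PMI(w, female) - PMI(w, male); positive values lean female, negative lean male."""
    return (pmi(cooc_female, count_w, count_female, total)
            - pmi(cooc_male, count_w, count_male, total))


# Toy counts: the word co-occurs twice as often with female context words,
# and both context sets are equally frequent overall.
bias = pmi_bias(cooc_female=20, cooc_male=10, count_w=100,
                count_female=1000, count_male=1000, total=100_000)
```

Under this assumed formulation the word's own count and the corpus size cancel algebraically, leaving log2((cooc_female / count_female) / (cooc_male / count_male)), so the score is driven by co-occurrence ratios rather than by the frequency of the word itself.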

Automatic Text Simplification of News Articles in the Context of Public Broadcasting

Diego Maupomé , Fanny Rancourt , Thomas Soulas , Alexandre Lachance , Marie-Jean Meurs , Desislava Aleksandrova , Olivier Brochu Dufour , Igor Pontes , Rémi Cardon , Michel Simard

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-26

This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Université de Montréal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).

Feature Acquisition using Monte Carlo Tree Search

Sungsoo Lim , Diego Klabjan , Mark Shapiro

Categories: Machine Learning

2022-12-21

Feature acquisition algorithms address the problem of acquiring informative features while balancing the costs of acquisition, in order to improve the learning performance of ML models. Previous approaches have focused on calculating the expected utility values of features to determine the acquisition sequences. Other approaches formulated the problem as a Markov Decision Process (MDP) and applied reinforcement learning based algorithms. In comparison to previous approaches, we focus on 1) formulating the feature acquisition problem as an MDP and applying Monte Carlo Tree Search, 2) calculating the intermediary rewards for each acquisition step based on model improvements and acquisition costs, and 3) simultaneously optimizing model improvement and acquisition costs with multi-objective Monte Carlo Tree Search. With Proximal Policy Optimization and Deep Q-Network algorithms as benchmarks, we show the effectiveness of our proposed approach in an experimental study.
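The abstract describes intermediary rewards built from model improvement and acquisition cost but gives no formula. One possible reading, shown here purely as an assumption, is a scalar reward equal to the gain in model score minus a weighted acquisition cost. The names `step_reward`, `episode_return`, and `cost_weight` are hypothetical, and this scalarized form does not reproduce the multi-objective search mentioned in point 3.

```python
def step_reward(score_before, score_after, cost, cost_weight=1.0):
    """Reward for acquiring one feature: model improvement minus weighted cost."""
    return (score_after - score_before) - cost_weight * cost


def episode_return(scores, costs, cost_weight=1.0):
    """Sum of step rewards along one acquisition sequence.

    scores[0] is the model score before any acquisition, so scores must
    have exactly one more entry than costs.
    """
    if len(scores) != len(costs) + 1:
        raise ValueError("scores must have one more entry than costs")
    return sum(
        step_reward(scores[i], scores[i + 1], costs[i], cost_weight)
        for i in range(len(costs))
    )


# Toy sequence: accuracy rises from 0.60 to 0.70 to 0.72 after two acquisitions.
total = episode_return(scores=[0.60, 0.70, 0.72], costs=[0.05, 0.01])
```

A tree search over acquisition sequences could use such per-step rewards to score each path, with `cost_weight` controlling the trade-off between accuracy and cost.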

Robust and Resource-efficient Machine Learning Aided Viewport Prediction in Virtual Reality

Yuang Jiang , Konstantinos Poularakis , Diego Kiedanski , Sastry Kompella , Leandros Tassiulas

Categories: Computer Vision | Artificial Intelligence

2022-12-20

360-degree panoramic videos have gained considerable attention in recent years due to the rapid development of head-mounted displays (HMDs) and panoramic cameras. One major problem in streaming panoramic videos is that panoramic videos are much larger in size compared to traditional ones. Moreover, the user devices are often in a wireless environment, with limited battery, computation power, and bandwidth. To reduce resource consumption, researchers have proposed ways to predict the users' viewports so that only part of the entire video needs to be transmitted from the server. However, the robustness of such prediction approaches has been overlooked in the literature: it is usually assumed that only a few models, pre-trained on past users' experiences, are applied for prediction to all users. We observe that those pre-trained models can perform poorly for some users because they might have drastically different behaviors from the majority, and the pre-trained models cannot capture the features in unseen videos. In this work, we propose a novel meta learning based viewport prediction paradigm to alleviate the worst prediction performance and ensure the robustness of viewport prediction. This paradigm uses two machine learning models, where the first model predicts the viewing direction, and the second model predicts the minimum video prefetch size that can include the actual viewport. We first train two meta models so that they are sensitive to new training data, and then quickly adapt them to users while they are watching the videos. Evaluation results reveal that the meta models can adapt quickly to each user, and can significantly increase the prediction accuracy, especially for the worst-performing predictions.

Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study

Jelena Sarajlić , Gaurish Thakkar , Diego Alves , Nives Mikelic Preradović

Categories: Natural Language Processing

2022-12-14

This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. The paper focuses on quotation annotation, co-reference resolution, and sentiment annotation in the Croatian SETimes news corpus, and on the analysis of its language-specific differences compared to English. From this, a list of the phenomena that require special attention when performing these annotations is derived. The resulting corpus, annotated with quotation features, can be used for multiple tasks in the field of Natural Language Processing.

Building Multilingual Corpora for a Complex Named Entity Recognition and Classification Hierarchy using Wikipedia and DBpedia

Diego Alves , Gaurish Thakkar , Gabriel Amaral , Tin Kuculo , Marko Tadić

Categories: Natural Language Processing

2022-12-14

With the ever-growing popularity of the field of NLP, the demand for datasets in low-resource languages follows suit. Following a previously established framework, in this paper we present the UNER dataset, a multilingual and hierarchical parallel corpus annotated for named entities. We describe in detail the procedure developed to create this type of dataset in any language available on Wikipedia with DBpedia information. The three-step procedure extracts entities from Wikipedia articles, links them to DBpedia, and maps the DBpedia sets of classes to the UNER labels. This is followed by a post-processing procedure that significantly increases the number of identified entities in the final results. The paper concludes with a statistical and qualitative analysis of the resulting dataset.

Building and Evaluating Universal Named-Entity Recognition English corpus

Diego Alves , Gaurish Thakkar , Marko Tadić

Categories: Natural Language Processing

2022-12-14

This article presents the application of the Universal Named Entity framework to generate automatically annotated corpora. By using a workflow that extracts Wikipedia data and meta-data and DBpedia information, we generated an English dataset which is described and evaluated. Furthermore, we conducted a set of experiments to improve the annotations in terms of precision, recall, and F1-measure. The final dataset is available and the established workflow can be applied to any language with existing Wikipedia and DBpedia. As part of future research, we intend to continue improving the annotation process and extend it to other languages.
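For reference, precision, recall, and F1 for automatic annotations can be computed against a gold standard as in the sketch below. It assumes exact matching of (entity, label) pairs; the matching criterion actually used by the authors is not stated in the abstract, and the example entities and labels are made up.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 between two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # annotations that match the gold standard exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical (entity, label) annotations.
predicted = {("Paris", "LOC"), ("Apple", "ORG"), ("Jordan", "LOC")}
gold = {("Paris", "LOC"), ("Apple", "ORG"), ("Jordan", "PER"), ("Zagreb", "LOC")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

Here two of the three predicted annotations are correct and two of the four gold annotations are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.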

Can Ensembling Pre-processing Algorithms Lead to Better Machine Learning Fairness?

Khaled Badran , Pierre-Olivier Côté , Amanda Kolopanis , Rached Bouchoucha , Antonio Collante , Diego Elias Costa , Emad Shihab , Foutse Khomh

Categories: Machine Learning | Artificial Intelligence

2022-12-05

As machine learning (ML) systems get adopted in more critical areas, it has become increasingly crucial to address the bias that could occur in these systems. Several fairness pre-processing algorithms are available to alleviate implicit biases during model training. These algorithms employ different concepts of fairness, often leading to conflicting strategies with consequential trade-offs between fairness and accuracy. In this work, we evaluate three popular fairness pre-processing algorithms and investigate the potential for combining all algorithms into a more robust pre-processing ensemble. We report on lessons learned that can help practitioners better select fairness algorithms for their models.
